{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checking distinct plans\n",
    "This notebook checks how many distinct plans our ensembles generated for each of Texas, Utah, and North Carolina. A plan counts as a duplicate when its row in the chain output matches an earlier row in every recorded column. We assume the repeated plans arise during the proposal step of the ReCom chain, when GerryChain merges two adjacent districts and happens to split the merged region back into the two original districts. For more information, please see: https://arxiv.org/abs/1911.05725"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "99687\n",
      "There are 313 duplicate plans from TX-SEN12\n"
     ]
    }
   ],
   "source": [
    "state_name = \"texas\"\n",
    "state_abbr = \"TX\"\n",
    "election_name = \"SEN12\"\n",
    "datadir = \"../outputs/\" + state_abbr + \"output/\"\n",
    "\n",
    "max_steps = 100000\n",
    "step_size = 10000\n",
    "\n",
    "# One CSV was written every step_size steps of the chain\n",
    "ts = range(step_size, max_steps + 1, step_size)\n",
    "\n",
    "# Read all the chunks, then concatenate once instead of growing a DataFrame in a loop\n",
    "df = pd.concat(\n",
    "    [pd.read_csv(datadir + state_name + election_name + \"_data\" + str(t) + \".csv\") for t in ts],\n",
    "    ignore_index=True,\n",
    ")\n",
    "\n",
    "# A row that matches an earlier row in every column is counted as a repeated plan\n",
    "no_dups = df.drop_duplicates()\n",
    "print(len(no_dups))\n",
    "print(\"There are \" + str(len(df) - len(no_dups)) + \" duplicate plans from \" + state_abbr + \"-\" + election_name)"
   ]
  },
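  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the count above is the total number of rows minus the number left after `drop_duplicates()`, which by default compares every column and keeps the first occurrence of each row. A minimal sketch on a small made-up table (hypothetical values, not chain output) checks that `DataFrame.duplicated().sum()` gives the same count directly:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data: rows 0 and 2 are identical, so one row is a repeat\n",
    "toy = pd.DataFrame({\"seats\": [5, 6, 5], \"eg\": [0.1, 0.2, 0.1]})\n",
    "\n",
    "n_distinct = len(toy.drop_duplicates())\n",
    "# duplicated() flags every row after the first copy of it\n",
    "n_repeats = int(toy.duplicated().sum())\n",
    "\n",
    "print(n_distinct, n_repeats)  # 2 1\n",
    "assert len(toy) - n_distinct == n_repeats"
   ]
  },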
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "99687\n",
      "There are 313 duplicate plans from NC-SEN16\n"
     ]
    }
   ],
   "source": [
    "state_name = \"northcarolina\"\n",
    "state_abbr = \"NC\"\n",
    "election_name = \"SEN16\"\n",
    "datadir = \"../outputs/\" + state_abbr + \"output/\"\n",
    "\n",
    "max_steps = 100000\n",
    "step_size = 10000\n",
    "\n",
    "# One CSV was written every step_size steps of the chain\n",
    "ts = range(step_size, max_steps + 1, step_size)\n",
    "\n",
    "# Read all the chunks, then concatenate once instead of growing a DataFrame in a loop\n",
    "df = pd.concat(\n",
    "    [pd.read_csv(datadir + state_name + election_name + \"_data\" + str(t) + \".csv\") for t in ts],\n",
    "    ignore_index=True,\n",
    ")\n",
    "\n",
    "# A row that matches an earlier row in every column is counted as a repeated plan\n",
    "no_dups = df.drop_duplicates()\n",
    "print(len(no_dups))\n",
    "print(\"There are \" + str(len(df) - len(no_dups)) + \" duplicate plans from \" + state_abbr + \"-\" + election_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As far as we can tell, it's just a coincidence(!) that the Texas and North Carolina runs have exactly the same number of duplicate plans (313)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "99863\n",
      "There are 137 duplicate plans from UT-SEN16\n"
     ]
    }
   ],
   "source": [
    "state_name = \"utah\"\n",
    "state_abbr = \"UT\"\n",
    "election_name = \"SEN16\"\n",
    "datadir = \"../outputs/\" + state_abbr + \"output/\"\n",
    "\n",
    "max_steps = 100000\n",
    "step_size = 10000\n",
    "\n",
    "# One CSV was written every step_size steps of the chain\n",
    "ts = range(step_size, max_steps + 1, step_size)\n",
    "\n",
    "# Read all the chunks, then concatenate once instead of growing a DataFrame in a loop\n",
    "df = pd.concat(\n",
    "    [pd.read_csv(datadir + state_name + election_name + \"_data\" + str(t) + \".csv\") for t in ts],\n",
    "    ignore_index=True,\n",
    ")\n",
    "\n",
    "# A row that matches an earlier row in every column is counted as a repeated plan\n",
    "no_dups = df.drop_duplicates()\n",
    "print(len(no_dups))\n",
    "print(\"There are \" + str(len(df) - len(no_dups)) + \" duplicate plans from \" + state_abbr + \"-\" + election_name)"
   ]
  },
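  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three cells above repeat the same loading and counting logic. A minimal sketch of one way to consolidate it is the hypothetical helper `count_duplicate_plans` below; it assumes every run's output follows the same file-naming pattern used above (`<datadir><state_name><election_name>_data<t>.csv`) and returns the number of distinct rows together with the number of repeats. The loop at the bottom is left commented out so the cell can be run without re-reading the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def count_duplicate_plans(datadir, state_name, election_name,\n",
    "                          max_steps=100000, step_size=10000):\n",
    "    \"\"\"Hypothetical helper: load one CSV per step_size steps and count repeated rows.\"\"\"\n",
    "    ts = range(step_size, max_steps + 1, step_size)\n",
    "    frames = [\n",
    "        pd.read_csv(os.path.join(datadir, f\"{state_name}{election_name}_data{t}.csv\"))\n",
    "        for t in ts\n",
    "    ]\n",
    "    df = pd.concat(frames, ignore_index=True)\n",
    "    # drop_duplicates() compares every column and keeps the first occurrence\n",
    "    n_distinct = len(df.drop_duplicates())\n",
    "    return n_distinct, len(df) - n_distinct\n",
    "\n",
    "\n",
    "# Usage, assuming the directory layout used above (uncomment to rerun all three states):\n",
    "# for state_name, state_abbr, election_name in [(\"texas\", \"TX\", \"SEN12\"),\n",
    "#                                               (\"northcarolina\", \"NC\", \"SEN16\"),\n",
    "#                                               (\"utah\", \"UT\", \"SEN16\")]:\n",
    "#     datadir = \"../outputs/\" + state_abbr + \"output/\"\n",
    "#     n_distinct, n_repeats = count_duplicate_plans(datadir, state_name, election_name)\n",
    "#     print(f\"There are {n_repeats} duplicate plans from {state_abbr}-{election_name}\")"
   ]
  },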
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
